services/horizon: Fix ReapLookupTables query #4525

bartekn · 2022-08-09T13:27:24Z

This commit fixes the query added in ee063a7. The previous query used limit ... offset ..., which becomes slow for large offset values. The PostgreSQL docs [1] explain why:

The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.

I rewrote the query to filter on the tables' id field with where id > ... instead. Before each run, the new id offset is fetched from the table; it is then returned so that the next cycle can start from it.
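To make the difference concrete, here is a minimal sketch of the two query shapes. The helper names offsetQuery and keysetQuery are hypothetical and not part of Horizon, and the id >= comparison follows the expected query in main_test.go rather than the description above.

```go
package main

import "fmt"

// offsetQuery pages with LIMIT/OFFSET. The server still has to compute and
// discard the first `offset` rows on every call, so later pages get slower.
// (Hypothetical helper, for illustration only.)
func offsetQuery(table string, batchSize, offset int64) string {
	return fmt.Sprintf("select id from %s order by id limit %d offset %d", table, batchSize, offset)
}

// keysetQuery pages by id instead. With an index on id the server can start
// reading at idOffset directly. (Hypothetical helper, for illustration only.)
func keysetQuery(table string, batchSize, idOffset int64) string {
	return fmt.Sprintf("select id from %s where id >= %d order by id limit %d", table, idOffset, batchSize)
}

func main() {
	fmt.Println(offsetQuery("history_accounts", 10, 100000))
	fmt.Println(keysetQuery("history_accounts", 10, 100000))
}
```

Both helpers interpolate the table name with fmt.Sprintf, which is only reasonable here because the table names would be fixed in code rather than taken from user input.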

I also realized that replacing count(*) in the subqueries with select 1 ... limit 1 further improves query performance. Instead of counting all matching rows in the parent tables, the query just checks whether any such row exists, which is all we need to determine whether a row is orphaned. This also makes it feasible to run the reaper for the history_accounts and history_assets tables.
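As a rough in-memory analogy (not how PostgreSQL executes the query, and with hypothetical function names), the sketch below contrasts counting every reference with stopping at the first one. The scanned counter shows how much work each approach does.

```go
package main

import "fmt"

// countRefs mirrors "select count(*) ...": it visits every row to total the
// matches, even though the caller only needs to know whether one exists.
func countRefs(refs []int64, id int64) (count, scanned int) {
	for _, ref := range refs {
		scanned++
		if ref == id {
			count++
		}
	}
	return count, scanned
}

// hasRef mirrors "select 1 ... limit 1": it stops at the first match.
func hasRef(refs []int64, id int64) (found bool, scanned int) {
	for _, ref := range refs {
		scanned++
		if ref == id {
			return true, scanned
		}
	}
	return false, scanned
}

func main() {
	// Each element stands for a history row referencing a lookup id.
	refs := []int64{7, 7, 3, 7, 7}
	count, scannedAll := countRefs(refs, 7)
	found, scannedFirst := hasRef(refs, 7)
	fmt.Println("count:", count, "rows scanned:", scannedAll)
	fmt.Println("found:", found, "rows scanned:", scannedFirst)
}
```

The more often an id is referenced, the bigger the gap, which fits the observation that heavily referenced tables only became practical to reap after this change.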

[1] https://www.postgresql.org/docs/current/queries-limit.html

services/horizon/internal/ingest/main.go

Shaptic

I have a general question about the way reaping works, as that isn't part of the codebase I have had much exposure to yet.

The docstring on ReapLookupTables states that it

removes rows from lookup tables which aren't used (orphaned), i.e. history entries for them were reaped

To me, the bigger picture of reaping entries has to do with ledger advancement, right? i.e. setting HISTORY_RETENTION_COUNT=x means data from x ledgers gets kept around. If every table either (a) tracks the ledger a row applies to or (b) can somehow be joined to identify said ledger, isn't reaping then just a matter of DELETE FROM table WHERE ledgerSeq < latestIngestedLedgerSeq; for every table?

In a related vein, it'd be nice if this PR clarified what offsets means within the docstring of ReapLookupTables. (To be clear, I'm aware of their purpose thanks to the awesome description in #4518, but adding it inline to the code will be helpful I think!)

Shaptic · 2022-08-09T20:32:47Z

services/horizon/internal/db2/history/main.go

@@ -935,7 +957,7 @@ func constructReapLookupTablesQuery(table string, historyTables []tableObjectFie
 	for i, historyTable := range historyTables {
 		_, err = fmt.Fprintf(
 			&sb,
-			`(select count(*) from %s where %s = hcb.id) as c%d, `,
+			`(select 1 from %s where %s = hcb.id limit 1) as c%d, `,


very nice optimization 👍

Shaptic · 2022-08-09T20:33:44Z

services/horizon/internal/db2/history/main_test.go

+			"(select 1 from history_trades where base_account_id = hcb.id limit 1) as c2, "+
+			"(select 1 from history_trades where counter_account_id = hcb.id limit 1) as c3, "+
+			"(select 1 from history_transaction_participants where history_account_id = hcb.id limit 1) as c4, "+
+			"1 as cx from history_accounts hcb where id >= 0 order by id limit 10) as sub "+


are negative ids possible?

No, the 0 in the id >= 0 condition is the offset param; it's there just to ensure we move forward when iterating over the tables in batches. So for the next cycle it will be id >= 1000 if the ID after 1000 rows (the batch size) is equal to 1000.
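The batching described in this reply can be sketched in memory as follows. This is an illustration under assumptions, not the Horizon implementation: the function names are hypothetical, and resetting the offset to 0 once the end of the table is reached is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// batch returns up to batchSize ids with id >= offset, like
// "where id >= offset order by id limit batchSize".
func batch(sortedIDs []int64, offset int64, batchSize int) []int64 {
	start := sort.Search(len(sortedIDs), func(i int) bool { return sortedIDs[i] >= offset })
	end := start + batchSize
	if end > len(sortedIDs) {
		end = len(sortedIDs)
	}
	return sortedIDs[start:end]
}

// nextOffset returns the id found batchSize rows past the current offset.
// When no such row exists it returns 0, so the next cycle starts over
// (this wrap-around is an assumption of the sketch).
func nextOffset(sortedIDs []int64, offset int64, batchSize int) int64 {
	start := sort.Search(len(sortedIDs), func(i int) bool { return sortedIDs[i] >= offset })
	if start+batchSize >= len(sortedIDs) {
		return 0
	}
	return sortedIDs[start+batchSize]
}

func main() {
	ids := []int64{1, 2, 5, 8, 13, 21, 34} // ids may have gaps
	offset := int64(0)
	for cycle := 0; cycle < 4; cycle++ {
		fmt.Println("offset", offset, "batch", batch(ids, offset, 3))
		offset = nextOffset(ids, offset, 3)
	}
}
```

Because the next offset is an id rather than a row count, gaps in the id sequence do not cause rows to be skipped or visited twice within a pass.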

Nit: Could probably clean this up with WITH clauses, and named sub-queries?

bartekn · 2022-08-10T08:14:42Z

To me, the bigger picture of reaping entries has to do with ledger advancement, right? i.e. setting HISTORY_RETENTION_COUNT=x means data from x ledgers gets kept around. If every table either (a) tracks the ledger a row applies to or (b) can somehow be joined to identify said ledger, isn't reaping then just a matter of DELETE FROM table WHERE ledgerSeq < latestIngestedLedgerSeq; for every table?

This is exactly what services/horizon/internal/reap does: it removes all rows from the history tables that contain transactions, operations, effects, etc. before a specified ledger (current sequence minus retention count). However, we also have lookup tables that denormalize history tables to save space. They look like [id, name], where name is the account ID in history_accounts or the claimable balance ID in history_claimable_balances. Then, in history tables like history_operation_participants, we can use an integer id instead of a long account ID.

The obvious problem with lookup tables is that a given ID can be used in multiple ledger ranges (depending on the activity of an account or claimable balance), so we can't simply remove rows from lookup tables while removing rows from normal history tables. Second, lookup tables are actively used by ingestion, so if the reaper removes a lookup row after ingestion has loaded it, it can cause inconsistencies in the DB.
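A small in-memory sketch of the distinction, assuming a simplified participant row and hypothetical helper names (none of these types exist in Horizon): history rows can be reaped by ledger alone, while a lookup row is only orphaned once no remaining history row references its id.

```go
package main

import (
	"fmt"
	"sort"
)

// participant is a simplified history row: it records a ledger and refers to
// an account by its lookup id rather than by the full account ID.
type participant struct {
	ledger    int32
	accountID int64
}

// reapHistory keeps only rows at or after the cutoff ledger
// (cutoff = current sequence minus retention count).
func reapHistory(rows []participant, cutoff int32) []participant {
	var kept []participant
	for _, r := range rows {
		if r.ledger >= cutoff {
			kept = append(kept, r)
		}
	}
	return kept
}

// orphanedIDs returns the lookup ids that no remaining history row references.
func orphanedIDs(lookup map[int64]string, rows []participant) []int64 {
	used := make(map[int64]bool, len(rows))
	for _, r := range rows {
		used[r.accountID] = true
	}
	var orphans []int64
	for id := range lookup {
		if !used[id] {
			orphans = append(orphans, id)
		}
	}
	sort.Slice(orphans, func(i, j int) bool { return orphans[i] < orphans[j] })
	return orphans
}

func main() {
	// Placeholder names stand in for real account IDs.
	lookup := map[int64]string{1: "account-A", 2: "account-B"}
	rows := []participant{{ledger: 10, accountID: 1}, {ledger: 20, accountID: 1}, {ledger: 10, accountID: 2}}

	kept := reapHistory(rows, 25-10) // current sequence 25, retention count 10
	fmt.Println("history rows kept:", kept)
	fmt.Println("orphaned lookup ids:", orphanedIDs(lookup, kept))
}
```

Account 1 appears in ledgers 10 and 20, so its lookup row has to survive the reap at ledger 15 even though one of its history rows is removed; only account 2 becomes orphaned. The sketch ignores the concurrency concern with ingestion mentioned above.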

bartekn · 2022-08-10T12:51:56Z

@stellar/horizon-committers I added two more things, PTAL:

8c2e40c Logging deleted rows counter (only when any rows removed).
0317d5a Activates lookup reaping feature only when --history-retention-count is set.

bartekn added 5 commits August 9, 2022 15:27

services/horizon: Fix ReapLookupTables query

63c4e8d

Find new offsets before running a query

fde02a6

Improve query

12ef5b9

Improve query even more

afaf1e9

Fix test

9b1cf59

bartekn marked this pull request as ready for review August 9, 2022 14:54

bartekn requested a review from a team August 9, 2022 14:54

bartekn added 2 commits August 9, 2022 16:58

Remove foreign constraints

25267c2

Remove history_assets

0b3b8d9

2opremio reviewed Aug 9, 2022

services/horizon/internal/ingest/main.go (outdated, resolved)

Decrease batchSize

98e68be

Shaptic reviewed Aug 9, 2022

bartekn added 4 commits August 10, 2022 10:43

Decrease batchSize

a4b6172

Merge branch 'master' into fix-reap-lookup-tables-query

964fc71

Merge branch 'fix-reap-lookup-tables-query' of github.com:bartekn/go …

112c742

…into fix-reap-lookup-tables-query

Remove debug log

d32a2fb

paulbellamy approved these changes Aug 10, 2022

bartekn added 2 commits August 10, 2022 13:51

Add deleted rows count

8c2e40c

Activate reaping only when --history-retention-count enabled

0317d5a

Shaptic approved these changes Aug 10, 2022

bartekn mentioned this pull request Aug 11, 2022

horizon: history_claimable_balances is not cleared out by the reaper. #4396

Closed

Merge branch 'master' into fix-reap-lookup-tables-query

ceba772

bartekn merged commit 69225cf into stellar:master Aug 11, 2022

bartekn deleted the fix-reap-lookup-tables-query branch August 11, 2022 13:04
